
DOI: 10.1201/9781003355205-1

Chapter 1

Sequencing and Raw Sequence Data Quality Control

1.1  NUCLEIC ACIDS

Nucleic acids are molecules that every living organism must have. They carry information that directs biological activities in cells and determines the inherited characteristics of the organism. The two main kinds of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the master blueprint for life, or the book of life, and it constitutes the genetic material in prokaryotic and eukaryotic cells and in the virions of DNA viruses. RNA is the genetic material of RNA viruses, but in other organisms it is found as molecules transcribed from DNA that play important biological roles such as protein synthesis and gene regulation. The complete set of DNA molecules in a prokaryotic or eukaryotic cell is called the genome. RNA serves as the genome only in some viruses (RNA viruses).
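
The relationship between DNA and the RNA transcribed from it can be illustrated with a minimal Python sketch. It assumes the input is the DNA coding strand written with the letters A, C, G, and T, so that the transcript has the same sequence with thymine (T) replaced by uracil (U). The names `DNA_ALPHABET` and `transcribe` are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: DNA_ALPHABET and transcribe are hypothetical names.
DNA_ALPHABET = set("ACGT")


def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence for a DNA coding strand (T is replaced by U)."""
    seq = coding_strand.upper()
    # Reject anything that is not one of the four DNA letters.
    unknown = set(seq) - DNA_ALPHABET
    if unknown:
        raise ValueError(f"not a DNA sequence, unexpected symbols: {sorted(unknown)}")
    return seq.replace("T", "U")


print(transcribe("ATGCGT"))  # AUGCGU
```

The sketch models only the change of alphabet; it does not attempt to represent the enzymes or regulation involved in transcription.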

A nucleic acid (DNA or RNA) is a polymer built from four kinds of building blocks called nucleotides. A nucleotide consists of (i) a sugar molecule (deoxyribose in DNA or ribose in RNA) attached to a phosphate group and (ii) a nitrogen-containing base called a nucleobase. In general, a nucleic acid sequence is made up of four nucleotides distinguished from one another only by their nitrogen-containing bases: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T) in the DNA molecule, and Adenine (A), Cytosine (C), Guanine (G), and Uracil (U) in the RNA molecule. These nucleobases are divided into pyrimidine and purine bases. Pyrimidine bases include cytosine, thymine, and uracil; they are aromatic heterocyclic organic compounds with a single ring. Purine bases include adenine and guanine, which have two fused heterocyclic rings. A DNA molecule exists in the form of two complementary strands (forward and reverse) that wind around each other, forming a double-helix structure. The two strands are held together by hydrogen bonds formed between the bases (adenine is a base pair of thymine (A/T), and cytosine is a